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AN ADAPTIVE CONJUGATE GRADIENT ALGORITHM 
WITH CLUSTERING THE SINGULAR VALUES OF THE 
SEARCH DIRECTION MATRIX FOR LARGE-SCALE 
UNCONSTRAINED OPTIMIZATION 


Neculai ANDREI! 


Abstract. An adaptive nonlinear conjugate gradient algorithm based on clustering the 
singular values of the search direction matrix and on the inexact Wolfe line search is 
presented. The search direction is dependent to a positive parameter. The value of this 
parameter is selected in such a way that the singular values of the matrix defining the 
search direction are clustered around 1. We prove that for general nonlinear functions 
and independent of the line search procedure the search direction satisfies both the 
sufficient descent condition and the Dai and Liao conjugacy condition. According to the 
value of the parameter, the algorithm uses the suggested search direction, or it triggers to 
the Hestenes and Stiefel direction. Under classical assumptions, for uniformly convex 
functions, we prove that the algorithm is globally convergent. Using a set of S00 
unconstrained optimization test problems we prove that our algorithm is significantly 
more efficient and more robust than CG-DESCENT algorithm and slightly more efficient 
and more robust than ADCG algorithm. By solving five applications from the 
MINPACK-2 test problem collection, with 10° variables, we show that the suggested 
adaptive conjugate gradient algorithm is top performer versus CG_DESCENT. 


Keywords: Unconstrained optimization, Conjugate gradient algorithms, Wolfe conditions, 
Singular values clustering, Sufficient descent condition 


1. Introduction 
Let us consider the unconstrained optimization problem 


min{ f(x),x €R"}, (1.1) 


where f:R" —R is continuously differentiable and bounded below. For solving this 
problem we suggest a nonlinear conjugate gradient algorithm, where the iterates x,, 
k =0,1,... are generated as 


Xp = Xe FAA» (1.2) 
the stepsize @, is positive and the search directions d, are computed as: 


diy = Bien TG, Sis d, =—8o> (1.3) 
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wy _ Ye Ses all St Sixt 
B= eas aa (1.4) 
Ve Sk Ve Se Ve Se 
where | is Euclidian norm, g, =Vf(%,). Y, =8pu —8n> Sy =X —% and @, is a 


positive parameter which follows to be determined. Observe that (1.4) is very close to the 
conjugate gradient parameter of Hager and Zhang conjugate gradient algorithm [22], where 
@, = 2. In this paper we are interested to determine a value for the parameter @, in such a 
way to get an efficient and robust conjugate gradient algorithm able to solve large-scale 
unconstrained optimization problems. 

Observe that, if f is a quadratic function and the step length @, is selected to achieve the 


exact minimum of f in the direction d,, then s/g,,,=0, i-e., the formula (1.4) for 7," 
reduces to the Hestenes and Stiefel [24] (HS) scheme. However, in this paper we consider 
general nonlinear functions and inexact line search based on Wolfe conditions [33, 34]: 


F(X, +a,d,) < f (4) + 60, 80d), (1.5) 


esd 208, d), (1.6) 
where 0<d<o<l. 


We see that the parameter 73,” defined in (1.4) can be viewed as a modification of the HS 
conjugate gradient algorithm. Observe that if the step length @, is computed according to 
the Wolfe line search conditions (1.5) and (1.6), then YS, >0O. Therefore, along the 
iterations, when the step s, is small (in norm), the factor y, in the numerator of 
HS = yl g. ./ y.s, tends to zero. On the other hand, when the step s, is small, again the 
factor y, in the numerator the second term of 7," tends to zero. Hence, £3," becomes 
small and the new search direction d,,, is essentially a small alteration of the steepest 
descent direction —g,,,. In other words our method automatically adjust £3," to avoid or 
at least to attenuate jamming, which is the main defect of the steepest descent direction. 

It is worth saying that a conjugate gradient method related to our computational scheme 
(1.3) and (1.4) is that given by Dai and Liao [12], in which the parameter Be in (1.3) is 
replaced by: 


1 
a =a <1) Beas 


Yak (LF) 


where f > 0 is a constant parameter. For different choices of t, the computational scheme 
of Dai and Liao generates different results. An optimal value for ¢ in this algorithm is not 
known (see [3]). Observe that the method (1.3) and (1.4) can be viewed as an adaptive 


version of (1.7) where f, at each iteration, is updated as t = @, lly, I /(y;,S;): 
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The purpose of this paper is to find a value of the parameter @,, in such a way to get an 


efficient and robust conjugate gradient algorithm. For this we suggest using the structure 
of the singular values of the matrix associated to the search direction (1.3) and (1.4). 


Using (1.4) in (1.3) the search direction in our algorithm is computed as: 


Ye Bea [ye ee 
dpe = -Bat 7 See T. Sk? 
Ve Se Ve Sk Ve Sk (1.8) 


where @, iS a positive parameter. Now, considering the Perry [28] idea the search 
direction (1.8) can be represented as: 


Diy = Ay Sea (1.9) 
where 
2 
H age ly. | 5,8 
k+l T k oT Py 
Ve Sx VS Ve Sy (1.10) 


Observe that H,,, is a sum of a symmetric matrix J + ally, |’ /(y,8,)' |s,5; and a non- 
symmetric one s,y,/y,5,, ie., H,,, is a non-symmetric matrix. Therefore, (1.9) 
represents an ad hoc algebraic expression of the search direction d,,, in which the non- 
symmetric matrix H,,, is not an approximation to the inverse Hessian V7 f(x,,,)'. It is 
this algebraic form of the parameter 3", given by (1.4), which leads us to this expression 
of H,,,. In other words, strictly speaking, (1.9) is not a real, quasi-Newton direction. 

In this point, to define the algorithm the only problem we face is to specify a suitable 
value for the positive parameter @, in (1.8). The approach used here is to determine the 
value of the parameter @, in (1.8) in such a way to minimize the condition number of the 
iterate matrix H,,, in (1.9). In other words, the idea of this paper is to determine the 
value of the parameter @, in order to achieve more numerical stability in computation of 
the search direction (1.9). The effect of ill-conditioning of H,,, on the iterative algorithm 
(1.2) using the search direction (1.9) can be explained as follows (see also [8, 9]). For a 
vector ve R", let us denote fl(v)=[fl(v,),.... fl(v,)]’ as a vector in R", where fl(v,), 


i=l,...,n, is the nearest floating point number to v,. Using (9) we have 
My) = —Ay Spa) k=0O,1..... 
Therefore, 


[AG — ys = ||-Area (MS) "a Siu) 
s Ae s[llACged — Skat |. 
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Now, if the matrix H,,, 1s nonsingular, it follows that 


| (S141) = Sxl > | (dew) —d,..| 
[seu le eniden 


(des) 7; Aya | ; 
4. 


Therefore, the following inequality between the relative errors of d,,, and g,,, can be 
established: 


2 1 
Ae 


ial 


| Ade) me dys | 


|S) ~ Sku | 
lesa 


<n) Igual 
Sk+ (1.11) 


where «(H, w=|2 k all is the condition number of H,,,. Therefore, if the 


-1 
Ai 


iteration matrix H,,, 1s ill-conditioned, then even for small values of the relative error of 
&,.;> the relative error of the search direction d,,, may be large. In other words, if 
k(H,,,) is large, then the system (1.9) could be very sensitive to perturbations in g,,). 
The idea of this paper is to select a value for the parameter @, in (1.10) in such a way to 
minimize the condition number of the iteration matrix H,,,. Minimizing the condition 


number of the iteration matrix in conjugate gradient algorithms have also — been 
considered by Babaie-Kafaki and Ghanbari [8, 9] for Dai-Liao nonlinear conjugate 
gradient algorithm, or by Liu and Xu [25] for Perry descent conjugate gradient algorithm, 
leading them to efficient and robust conjugate gradient algorithms. 


The structure of the paper is as follows. The algorithm and its properties are presented in 
Section 2. We prove that the search direction used by this algorithm satisfies both the 
sufficient descent condition and the Dai and Liao conjugacy condition, independent of the 
line search. The parameter @, in the search direction (1.8) is determined by minimizing the 
condition number of the iteration matrix H,,,, 1.e., by clustering the singular values of this 


matrix around to 1. Using standard assumptions, Section 3 presents the global convergence 
of the algorithm for uniformly convex functions. In Section 4 the numerical comparisons of 
our algorithm versus CG-DESCENT [23] and ADCG [5] conjugate gradient algorithms are 
presented. The computational results, for a set of 800 unconstrained optimization test 
problems, show that this new algorithm substantially outperforms CG-DESCENT, and is 
slightly more efficient and more robust than ADCG. Considering five applications from the 


MINPACK-? test problem collection [7], with 10° variables, we show that our algorithm is 
way more efficient and more robust than CG-DESCENT. 


2. The algorithm 

An important property of our conjugate gradient algorithm is that the search direction 
(1.8) always yields descent when y/s, #0, a condition which is satisfied when f is 
strongly convex, or the step length @, is computed according to the Wolfe conditions. 
The following properties of the search direction (1.8) can immediately be proved. 
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Theorem 2.1. [f the step length a, in (1.2) is determined by the Wolfe line search 
conditions (1.5) and (1.6) and @, 21/4, then the search direction (1.8) satisfies the 
sufficient descent condition 


fads {1-2 fea <0. 


(2.1) 
Proof. Since d, =—g,, it follows that gj d, = -lg,|° <0. Hons) weet 
T T 
Sid ME 2, + O% Sry Sus) QO, bal (s; Br) 
VS yy, Sy y,. S, (2.2) 


Now, we apply the inequality u’v <— (lel? + jl to the second term in (2.2) with 


u= 


(5,8 
fa NR PRI OE —— 
2a, and Y=" [2e, (s) B44), 


to obtain 
2 
Ce Sea) Sea) _ OK Cede SMS; Siu) - 1 2 ly. | (5. Sev) 
T [een | +O, T 2 : 
Ve Sk G; i ~ 4a, (y;, 5, ) 


Therefore, introducing this estimation in (2.2) we get (2.1) showing that the search 
direction (1.8) satisfies the sufficient descent condition when 


@, 1/4. 


a 
If f is strongly convex or the line search satisfies the Wolfe conditions (1.5) and (1.6), 
then y/s, >0 and our computational scheme yield descent. Note that if @, >1/4, then 
2,.,d,,, is bounded by —(1-1/ 4a, 
for example, of Dai and Yuan [14, 15] only the negativity of - x, 18 established. 


2 3 zi $ 
, while in some other computational schemes, 


We note in passing that if @, = 2, then from (2.1) g. pit <= * like in [22]. 


Another important property of the search direction (1.8) is that it satisfies the Dai and 
Liao conjugacy condition [12], which addresses to the inexact line search, but reduces to 


the old conjugacy condition y/d, =0 when the line search is exact. 


Theorem 2.2. Consider @,>0 and the step length a, in (1.2) is determined by the 
Wolfe line search conditions (1.5) and (1.6). Then the search direction (1.8) satisfies the 
Dai and Liao conjugacy condition y{d,,,,=—V, (5, 84.,), where v, = 0. 
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Proof. By direct computation we have 


2 
Vp Ay =| MO, bf (5, Be =, Bead) 


2 

I> 
-s 

Ve Sk 


where v, =, By Wolfe line search conditions (1.5) and (1.6) it follows that 


YS; >0O, therefore 


v, >0. = 


In the following we are interested to specify a procedure for @, computation in (1.8) 
based on minimizing the condition number of the iterate matrix H,,,, given by (1.10). 


For this, we briefly present the singular value analysis and the condition number of a 
matrix. The following definitions and theorems, taken from Watkins [32], clarify some 
aspects of this concept of condition number of a matrix. 


Theorem 2.3. [32] Let Ac R”” be a nonzero matrix with rank r. Then, R” has an 
orthonormal basis v,,...,V,,, R" has an orthonormal basis u,,...,u,, and there exist the 


>”m? 


scalars 0,20, 2-+:-20, >0 such that 


x ea i=],...,7, Ar i i=l,...,7, 
a u. = 


: 0, i=rtl,...,m, 


and (2.3) 


Definition 2.1. The scalars o,,...,0, from the theorem 2.3 are called the singular values 
of the matrix A. 


naxm 


Based on the Theorem 2.3, for any nonzero matrix Ae R’™” with rank r it follows that 


Al =O, +...4+0%, (2.4) 


where I. represents the Frobenius norm. If r=m=n, then 


|det(A)| =O, XO, X+'XO,,. (2.5) 
As we mentioned, a very important concept in the sensitivity analysis of numerical 
computations with matrices is the matrix condition number. A matrix with a large 
condition number is called an ill-conditioned matrix since the computations with this 
matrix are potentially very sensitive to changes in data of the problem involving this 
matrix. 


Definition 2.2. For an arbitrary nonsingular matrix A, the scalar x(A)=|A||4“| is 


called the condition number of A. 
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Theorem 2.4. [32] If AcR"” is a nonsingular matrix with the singular values 
0, 20,2::-20, >0, then K(A)=0,/o,,. 


Definition 2.3. The condition number «(A) computed as above is called the spectral 
condition number. 

In our analysis we need to find the singular values of the matrix H,,,. For this, in our 
developments we assume that y,s, >0, which is guaranteed by the Wolfe line search 
conditions (1.5) and (1.6). The structure of the singular values of the matrix H,,, is given 
by the following theorem. 

Theorem 2.5. Suppose that the step length a, is determined by the Wolfe line search 
conditions (1.5) and (1.6). Let H,,, be defined by (1.10). Then H,,, is a nonsingular 


matrix and its singular values consist of 1 (n—2 multiplicity), o{,, and 0;,,,, where: 


of, = 5| Vea +1)? +(a, -1) +(@,a, - 1 + (a, =|, 


(2.6) 
Gia *| Yaa, +1) +(a, -1) —,|(@,a, -1)’ + (a, =|, 
2 (2.7) 
and 
a, = Is Dy >i. 
(¥% 5) (2.8) 


Proof. By the Wolfe line search conditions (1.5) and (1.6) we have that 4S > 0. 
Therefore, the vectors y, and s, are nonzero vectors. Let V be the vector space 
spanned by {s,, y,}. Clearly, dim(V) <2 and dim(V~) >n-—2. Thus, there exist a set 


of mutually unit orthogonal vectors {u/,}"> CV~ such that 
5, Uy = Yi Uy =0, ad RN eee 
which from (1.10) leads to 
Hn t= le sae. 
Therefore, the matrix H,,, has n—2 singular values equal to 1. 


Now, we are interested to find the rest of the two remaining singular values denoted as 
o;,, and o;,,,, respectively. From the formula of algebra (see for example [31]) 


det(I + pq’ +uv") =(1+q" p)(t+v'u)—(p'v)\(q"u), 
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2 
where p =-t, G=Y U=Q, mi C s, and v=s,, it follows that 
Ve Sx Ve Se 
2 2 
det(H,,,) = @, Z [ be] =@,a,; 
YES (2.9) 


where a, is given by (2.8). 
Since @, >0 and a, >1, it follows that H,,, is a nonsingular matrix. 
Now, by direct computation we get: 


2 


tr(H,H,,,.)=n-2+a,+a@,a;. (2.10) 


+1 
Since Fall, =tr(H{,,H,,,), from (2.4) we get 

(Of) + (Op) = a FOR. (2.11) 
Also, from (2.5) and (2.9) we have 
Oh = HA: (2.12) 


Now, from (2.11) and (2.12), after some simple algebraic manipulations we obtain: 


+ - _ 2:32; 
Onn + On = Jaa; + 20,0, + Ay » (2.13) 


Therefore, from (2.12) and (2.13), the remaining singular values o;,, and o;,,, of H,,, 
are the roots of the following quadratic polynomial 


o ~ Jaa? +20,a,+4,0+ @,a, =0. (2.14) 
Clearly, the other two singular values of the matrix H,,, are determined from (2.14) as 
(2.6) and (2.7) respectively. 


Observe that a, >1 follows from Wolfe conditions and the inequality 


T 2 
Vi Sx < Vx 
| 2 ys 

: | 


Observe that since a, >1 it follows that the singular values o;,, and o;,,, are well 
defined by (2.6) and (2.7), respectively. 


The following two proposition prove some important properties of the singular values 


i S 
Oj, and O;,)- 
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Proposition 2.1. For the singular value o,,,, defined in (2.7), we have o,,, <1. 


Proof. Observe that if @,a,<1, then since 0;,,<0;,,, from (2.12) we have that 

O,,, <1. On the other hand, if @,a, 21, then from (2.6) we have: 
¢ <1 1 

O;,, 2 —(@,a, +1) +—(@,a, —1) = @,a,. 

2 2 (2.15) 


With this, from (2.12) it follows that 71 SI 


Proposition 2.2. For the singular value o;,,, defined in (2.6), we have o;,, =1. 


Proof. As in Proposition 2.1 above, if @,a, =1, then from (2.15) we have that o;,, >1. 
On the other hand, if @,a, <1, then from (2.6) we have 


1 1 
Char 25 (OA ++ 5-qy) = 1. : 


Now, since a, >1, from (2.12) and (2.13) it follows that both o;,, and o;,, are positive. 
Therefore, from the above propositions we have 0<o;,,, <1<o;j,,. From Theorem 2.4 


‘ 
o 
K(H,..)=—. 


we have that (2.16) 


k+l 


As we have mentioned in Section | in order to enhance the numerical stability in the 
search direction computation, it is reasonable to determine the value of the parameter @, 


in (1.8) by minimizing the condition number of H,,,. In a simple computational scheme, 
from (2.16) we see that minimizing «(H,,,) 1s to minimize the distance between o;,,, 
and o;,,. Therefore, the optimal value of @,, denoted @,, is determined as: 

@, = arg min(O;,, — O41)» (2.17) 
thus making o;,, as close as possible to o;,,,;. Since 0<o;,,, <1<0;,;, it follows that 
@, solution of (2.17) makes «(H,,,)=1 as close as possible to 1. From (2.6) and 

«1 
(2.7), a simple algebraic development shows that © = a (2.18) 
k 
where a, >1 is given by (2.8). Therefore, for @, =1/a, the singular values of H,,, are 
clustered around 1. Notice that for @, =1/a, the matrix H,,, from (1.10) becomes: 


T T 
— SKYE SSK 


k+l T va 
Yee [sel 
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In the following, from Theorem 2.1, we observe that the necessary condition for the 
sufficient descent condition of the search direction is @, 21/4. Therefore, the condition 


for minimizing «(/,,,) is a, <4. Now, we can define our algorithm as follows. 


If a, <4, then we select @, =1/a, in (1.8) in order to achieve both the sufficient descent 
condition and minimizing the condition number «(H,,,). Otherwise, the algorithm uses 
the Hestenes and Stiefel direction. Since the search direction (1.8) has the property of 
sufficient descent for any value @, 21/4, it follows that for any value of a, <7, where 
1<z<4 is a parameter, the singular values of the matrix H,,, are clustered around 1. 
Therefore, the search direction of our algorithm is given by (1.3) where the parameter 
B” is computed as: 


T T 
Ye Seu — Ske Ses 


= 5. Ab sap ST, 
YeSe se 


i= 
‘ yg 
PKS eH if a,>T. 


T: 
Vek (2.19) 


Our algorithm (1.3) with (2.19) can be considered as an adaptive conjugate gradient 
algorithm subject to the parameter 1<7<4. If a,>z, then the search direction is 


triggered to the HS direction, otherwise the search direction is that specified in (1.8) with 
@,=1/a,, where a, is given by (2.8). We see that according to the value of the 


parameter 7 the behavior of our algorithm is closer to that of the HS algorithm, or to the 
algorithm given by (1.8) where @, =1/a,. 


Observe that our algorithm is a modification of the HS conjugate gradient algorithm 
based on the idea of minimizing the condition number of the matrix defined by the search 
direction (1.3) and (1.4). The CG-DESCENT algorithm proposed by Hager and Zhang 
[22] also is a modification of the HS conjugate gradient algorithm by ex abrupto deleting 
a term from the search direction for the memoryless quasi-Newton scheme of Shanno 
[30]. Again, using this approach we get a value for the parameter f in the Dai and Liao 
conjugate gradient parameter (1.7) for which the condition number of the search matrix is 
minimized. Taking into consideration the above developments and using the procedure of 
acceleration of conjugate gradient algorithms presented in [2], the following algorithm 
can be presented. 


NCG Algorithm (New Conjugate Gradient Algorithm) 


Select a starting point x, € R” and compute: f(x,), 8) =Vf(%). Select some 
Step 1. positive values for 6 and o used in Wolfe line search conditions. Consider a 
positive value for the parameter 7. (1<7 <4) Set d, =—g, and k =0. 


Test a criterion for stopping the iterations. If this test is satisfied, then stop; 


Step 2. : ; ‘ 
P otherwise continue with step 3. 
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Step 3. Determine the steplength a, by using the Wolfe line search (1.5) and (1.6). 
Step 4. Compute z=x,+a@,d,, g,=Vf(z) and y, =8,—8,. 


Step 5. Compute: a =a@,g/d, and b, =a, y,d,. 

Acceleration scheme. If b, > 0, then compute é, =—a, /b, and update the 
Step 6. variables as x,,, =x, +¢,a@,d,, otherwise update the variables as 

Xp, =X, + a,d,. 
Step 7. Compute a, as in (2.8). 


Step 8. Compute the search direction as in (1.3) where £3," is computed as in (2.19). 


Step 9. Powell restart criterion. If gine > 0.2\|o.41 ee da = Beare 


Step 10. Consider k =k +1 and goto step 2. | 


If function f is bounded along the direction d,, then there exists a stepsize a, satisfying 


the Wolfe line search (see for example [17] or [29]). In our algorithm when the Beale- 
Powell restart condition is satisfied, then we restart the algorithm with the negative gradient 
—g,,,;- More sophisticated reasons for restarting the algorithms have been proposed in the 
literature [13], but we are interested in the performance of a conjugate gradient algorithm 
that uses this restart criterion associated to a direction which satisfies both the descent and 
the conjugacy conditions. Under reasonable assumptions, the Wolfe conditions and the 
Powell restart criterion are sufficient to prove the global convergence of the algorithm. The 
first trial of the step length crucially affects the practical behavior of the algorithm. At every 


iteration k 21 the starting guess for the step @, in the line search is computed as 


Oe ll@.||/¢;- For uniformly convex functions the linear convergence of the 


acceleration scheme used in the algorithm NCG is proved in [2]. Clearly, the acceleration 
scheme improves the performances of the algorithm [2]. Numerical comparisons may 
drastically change by introducing acceleration. However, we are interested to see the 
performances of this algorithm equipped with an acceleration scheme. 


3. Global convergence analysis 


The global convergence analysis of the above algorithms is based on bounding the norm of 
the search direction, (see Gilbert and Nocedal, [19], Nocedal, [27] or Dai, et al [16]). In this 
section we prove the global convergence of the above algorithms under the following 


assumptions. Assume that:The level set S = {x ER": fiosf (x))} is bounded. 


(i) Ina neighborhood N of S the function f is continuously differentiable and 
its gradient is Lipschitz continuous, i.e. there exists a constant L>O such 
that I|VF (x) —Vf (y)| < L|x -y 


, forall x,yEN. 
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Since {f(x,)} is a decreasing sequence, it is clear that the sequence {x,} generated by 
the proposed algorithm NCG is contained in $. Under these assumptions on f there 


exists a constant I'>0 such that Vf (x)||<T for all xeS. Notice that the assumption 


that the function f is bounded below is weaker that the usual assumption that the level 
set is bounded. 


Although the search directions generated by the algorithm are always descent 
directions, to ensure convergence of the algorithm we need to constrain the choice of 


the step-length @,. 


The following proposition shows that the Wolfe line search always gives a lower bound 
for the stepsize @,. 


Proposition 3.1. Suppose that d, is a descent direction and the gradient Vf satisfies the 
Lipschitz condition 


|| Vf (x) — Vf (x,)|] $ Lx —~, | 


for all x on the line segment connecting x, and x,,,, where L is a positive constant. If 
the line search satisfies the Wolfe conditions (1.5) and (1.6), then 


(l-o)|g7d,| 
t= eae 
Ld, 


Proof. Subtracting gid , from both sides of (1.6) and using the Lipschitz continuity we 
get 


(o-lgid, S (Bia =@,) 4, = yd, Ss Iya. Ss a,L|\d, || 
Since d, is a descent direction and o <1, we get the conclusion of the proposition m= 


For any conjugate gradient method with strong Wolfe line search the following general 
result holds [27]. 


Proposition 3.2. Suppose that the above assumptions hold. Consider a conjugate 
gradient algorithm in which, for all k 20, the search direction d, is a descent direction 


and the stepsize a, is determined by the Wolfe line search conditions. If 


arn 


>= 
k=0 d,| 


(3.1) 
then the algorithm converges in the sense that 


liminf |g, |= 0. ay 
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For uniformly convex functions we can prove that the norm of the direction d,,, 


computed as in (1.3) with (2.19) is bounded above. Therefore, by proposition 3.2 we can 
prove the following result. 


Theorem 3.1. Suppose that the assumptions (i) and (ii) hold. Consider the algorithm 
NCG where the search direction d, is given by (1.3) and Be is computed as in (2.19). 
Suppose that a, is computed by the Wolfe line search. Suppose that f is a uniformly 
convex function on S, i.e. there exists a constant 11>0 such that 
2 
(Vf (x) —Vf(y))" (x y) = tlle —y| (3.3) 
forall x,yeEN. Then 


Hime, | =0 a 


Proof. From Lipschitz continuity we have ly, | < L|s, |. On the other hand, from uniform 


convexity it follows that y/s, = hls, |’. Now, using (2.19) in (1.3) for a, <7, we have 


T 
Is! Skat 
2 
Is. 


showing that (3.1) is true. Again, using (2.19) in (1.3) for a, >7 it follows that 


ye Ben 


ills, Isle bse Lr 
T 
Ve SK 


2 2 <2F+—, 
“ls, Is. u 


lanl [eal Is. [+ Is. [<0 + 
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showing that (3.1) is true. By proposition 3.2 it follows that (3.2) is true, which for 
uniformly convex functions is equivalent to (3.4) 


The convergence analysis for general nonlinear functions follows the developments given 
by Hager and Zhang [22]. If the level set S is bounded, the Lipschitz condition 


VF (x) -—Vf (y)| < L|x- y| holds and the step length satisfies the Wolfe conditions (1.5) 
and (1.6), then for the algorithm (1.2), (1.3) and (2.19) either g, =O for some k or 
liminf, ,.,,|g,|=0 (see theorem 3.2 in [22]). 


ko | 


4. Numerical results and comparisons 


The NCG algorithm was implemented in double precision Fortran using loop unrolling of 
depth 5 and compiled with f77 (default compiler settings) and run on a Workstation Intel 
Pentium 4 with 1.8 GHz. We selected a number of 80 large-scale unconstrained 
optimization test functions in generalized or extended form presented in [1]. For each test 
function we have considered 10 numerical experiments with the number of variables 
increasing as nm =1000,2000,...,10000. The algorithms compared in this section use the 
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Wolfe line search conditions with cubic interpolation [31], 6=0.0001, o=0.8 and the 
same stopping criterion | gy LL. <10°, where lL is the maximum absolute component of 
a vector. 

Since, CG-DESCENT [23] is among the best nonlinear conjugate gradient algorithms 


proposed in the literature, but not necessarily the best, in the following we compare our 
algorithm NCG versus CG-DESCENT. 


When the algorithms are compared we can consider at least two points of view: the first is 
based on the optimal point generated by the algorithm, and the second one is using the 
objective function value in this point. 


Since all the algorithms used and compared in this paper generate local solutions, we 
compare them by using the point of view based on the objective function value in the 
point determined by the algorithms. 

Therefore, the comparisons of algorithms are given in the following context. Let 
fi" and f°? be the optimal value found by ALG1 and ALG2, for problem 
i=1,...,800, respectively. 


We say that, in the particular problem i, the performance of ALG1 was better than the 
performance of ALG2 if: 


ALG1 ALG2 =3. 
i i | oe (4.1) 
and the number of iterations (#iter), or the number of function-gradient evaluations (#fg), 
or the CPU time of ALGI was less than the number of iterations, or the number of 
function-gradient evaluations, or the CPU time corresponding to ALG2, respectively. 
Possibly, some other points of view for comparing the algorithms can be used, but in this 
paper we consider this one. 


Of course, the test problems where the algorithms do not converge to the same function 
value, according to criterion (4.1), are discarded from comparisons. 


Figure 1 shows the performance profiles of Dolan-Moré [18] subject to CPU time metric 
for different values of parameter z. That is, for each method, we plot the fraction of 
problems for which the method is within a factor of the best time. The left side of the 
figures gives the percentage of the test problems for which a method is the fastest; the 
right side gives the percentage of the test problems that are successfully solved by each of 
the methods. Clearly, the top curve corresponds to the method that solved the most 
problems in a time that was within a factor of the best time. Form figure 1, for example 
for t=1.1, comparing NCG versus CG-DESCENT with Wolfe line search, subject to the 
number of iterations, we see that NCG was better in 618 problems (i.e. it achieved the 
minimum number of iterations for solving 618 problems), CG-DESCENT was better in 
98 problems and they achieved the same number of iterations in 53 problems, etc. Out of 
800 problems, we considered in this numerical study, only for 769 problems does the 
criterion (4.1) hold. 
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Fig. 1. NCG versus CG-DESCENT for different values of 7. 


From figure 1 we see that for different values of the parameter zs NCG algorithm is more 
efficient and more robust than CG-DESCENT. In comparison with CG-DESCENT, on 
average, NCG appears to generate better search direction. We see that this computational 
scheme based on clustering the singular values of the matrix representing the search 
direction (1.3) and (2.19) lead us to a conjugate gradient algorithm which substantially 
outperforms the CG-DESCENT, being way more efficient and more robust. In the second 
set of numerical experiments we compare NCG versus ADCG algorithm [5]. 
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The ADCG is an adaptive conjugate gradient algorithm where the search direction is 
computed as the sum of the negative gradient and a vector determined by minimizing the 
quadratic approximation of the function f at the current point. Using a special 
approximation to the inverse Hessian of the objective function, which depend by a 
positive parameter, a search direction is obtained which satisfies both the sufficient 
descent condition and the Dai-Liao’s conjugacy condition. The parameter in the search 
direction is determined in an adaptive manner by minimizing the largest eigenvalue of the 
matrix defining it in order to cluster all the eigenvalues. The search direction in ADCG 
algorithm is computed as 


dia = Ores8eu (4.2) 


where 


ar: 
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and the parameter @, is determined in such a way to cluster all its eigenvalues. In [5] the 
parameter @, is computed as: 
T 
VE Sk 


yx 


O, = 1, 


(4.4) 


where 


if a, 2 


— 


ee I]y, 
(4.5) 


a, is defined by (2.8), and 7>1 is a positive constant. Therefore, the ADCG algorithm 


is based on clustering the eigenvalues of the search direction matrix (4.3). On the other 
hand, the NCD algorithm is using the clustering of the singular values of search direction 
matrix (1.10). Observe the differences between H,,, given by (1.10) used in NCG 


algorithm and Q,,, given by (4.3) used in ADCG algorithm. We see that 


O,., =H, +y,5, /y,8,- Both these matrices H,,, and Q,,, are not symmetric 
matrices, as usual in quasi-Newton methods. They are used in these algorithms in order to 
find the values of parameter @, to cluster the singular values of H,,, or the eigenvalues 
of Q,,,, respectively. In [5] we have the computational evidence that ADCG is not 
sensitive to the values of the parameter 7, and is way more efficient and more robust 
than CG-DESCENT. 


In Figure 2 we present the performance profiles of Dolan-Moré subject to CPU time 
metric, of NCG versus ADCG, for different values of the parameters 7 and 7. 
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Fig. 2. NCG versus ADCG for different values of 7 and 7. 


54 Neculai Andrei 


The NCG algorithm is based on minimizing the condition number of the matrix defining 
the search direction, i.e., on clustering the singular values around 1. On the other hand, 
the ADCG algorithm is based on clustering the eigenvalues of the same matrix. In Figure 
2 we have the computational evidence that NCG algorithm is slightly more efficient and 
more robust than ADCG for any combination of parameters + and 7. From Figure 2 we 
see that both algorithms are not sensitive to the values of these parameters. Practically, all 
performance profiles have the same allure for any combination of z and 7. Singular 
values analysis in designing conjugate gradient algorithms is more profitable subject to 
efficiency and robustness, but this is not overwhelming, both concepts (singular values 
and eigenvalues) leading to very similar results. (see also [4]). 


In the following, in the third set of numerical experiments, we present comparisons 
between NCG and CG-DESCENT conjugate gradient algorithms for solving some real 
applications from the MINPACK-2 test problem collection [7]. In Table 1 we present 
these applications, as well as the values of their parameters. 


Table 1 
Applications from the MINPACK-2 collection. 


Al Elastic—plastic torsion ([20], pp. 41-55), c=5 


A2 | Pressure distribution in a journal bearing [11], b=10, ¢=0.1 


A3__| Optimal design with composite materials [21], 2 = 0.008 


A4 | Steady-state combustion ([6], pp. 292-299), [10], 2=5 


AS5 Minimal surfaces with Enneper conditions ([26], pp. 80-85) 


The infinite-dimensional version of these problems is transformed into a finite element 
approximation by triangulation. Thus a finite-dimensional minimization problem is 
obtained whose variables are the values of the piecewise linear function at the vertices of 
the triangulation. The discretization steps are nx =1,000 and ny =1,000, thus obtaining 


minimization problems with 1,000,000 variables. 


Table 2 
Performance of NCG versus CG-DESCENT. 1,000,000 variables. ts = 4, CPU seconds. 
NCG CG-DESCENT 
#iter #fg cpu #iter #fg cpu 

Al 1113 2257 351.62 1145 2291 474.64 
A2 2843 5714 1143.97 3370 6741 1835.51 
A3 4725 9494 2754.26 4814 9630 3949.71 
A4 1413 2864 2014.17 1802 3605 3786.25 
AS 1270 2566 571.45 1225 2451 753.75 

TOTAL 11364 22895 6835.47 12356 24718 10799.86 
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A comparison between NCG _ (Powell restart criterion, Vf (IL, <10°, 
0 =0.0001, 7=0.8, t=4) and CG-DESCENT (version 1.4, Wolfe line search, default 
settings, 
we see that, subject to the CPU time metric, the NCG algorithm is top performer and the 
difference is significant, about 3964.39 seconds for solving all these five applications. It is 
worth saying that intensive numerical experiments for solving the applications from 
MINPACK-2 collection with different values of the parameter 1<z <4 mainly yield similar 
results concerning the numerical performances of NCG algorithm. In all cases, for all these 
numerical experiments, NCG was top performer versus CG-DESCENT. The NCG and CG- 
DESCENT algorithms (and codes) are different in many respects. Since both of them use the 
Wolfe line search (however, implemented in different manners), these algorithms mainly differ 
in their choice of the search direction. The search direction d,,, given by (1.3) where the 


Vf (x)IL, <10°, ) for solving these applications is given in Table 2. Form Table 2, 


parameter 73," is computed as in (2.19) is more elaborate: it is adaptive and the singular values 


of the matrix defined by it are clustered around 1. In addition it satisfies both the descent 
condition and the conjugacy condition in a restart environment. 


5. Conclusions 


A new adaptive conjugate gradient algorithm based on singular values study of the search 
direction matrix has been presented. The idea of this paper is to generalize the search direction 
of CG-DESCENT conjugate gradient algorithm of Hager and Zhang [22] by introducing a 
positive parameter w, instead of constant 2 used in conjugate gradient parameter By 7. At the 
same time, the paper contains a development for a value of the positive parameter ¢ used in 
conjugate gradient parameter £,°" from the Dai-Liao’s conjugate gradient algorithm [12]. The 
value of this parameter is computed in such a way that the condition number of the matrix 
defining the search direction is minimized. Mainly, in our algorithm, minimizing the condition 
number of the iteration matrix defining the search direction reduces to determine the value of 
the parameter to minimize the distance between the singular values of the corresponding 
matrix, i.e., to cluster the singular values around 1. It is proved that the search direction satisfies 
both the sufficient descent condition and the Dai-Liao’s conjugacy condition. Thus, the 
algorithm is a conjugate gradient one. To satisfy both the clustering of the singular values and 
the sufficient descent condition an adaptive scheme is used which depend by a positive 
parameter. The algorithm is not sensitive to the value of this parameter. The stepsize is 
computed using the classical Wolfe line search conditions with a special initialization. In order 
to improve the reducing the values of the objective function to be minimized an acceleration 
scheme is used. For uniformly convex functions, under classical assumptions, the algorithm is 
globally convergent. Numerical experiments and intensive comparisons using 800 
unconstrained optimization problems of different dimensions and complexity proved that this 
conjugate gradient algorithm is way more efficient and more robust than CG-DESCENT 
algorithm [23], and slightly more efficient and more robust than ADCG algorithm [5]. In an 
effort to see the performances of this conjugate gradient algorithm we solved five large-scale 
real nonlinear optimization applications from MINPACK-2 collection, up to 10° variables, 
showing that NCG is clearly more efficient and more robust than CG-DESCENT. 
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